feat: Built-in agent — LLM-powered AEO analyst with chat API #74
Conversation
arberx
left a comment
🤖 Automated Review Summary
Files reviewed: 21
Comments left: 13
Issues found:
- 🔴 Bug: 5
- 🟡 Security: 2
- 🟠 Performance: 1
- 🔵 Type Safety: 2
- 🟣 Testing: 1
- ⚪ Style: 1
- ⚪ Dead code: 1
Key findings
Bugs (fix before merge):
- Tool-call persistence ordering (`loop.ts` ~line 188) — assistant tool-call rows are stored after `tool.execute()` runs. If execution throws, the DB ends up with a `tool` result row but no matching `assistant` row, corrupting thread replay for Claude.
- Empty `apiKey` fallback (`server.ts` ~line 499) — `apiKey ?? ''` means a misconfigured provider silently constructs a working handler that returns 401 on every LLM call instead of returning `undefined` to disable the agent.
- `ApiClient` with undefined `apiUrl`/`apiKey` (`server.ts` ~line 503) — self-hosted instances without `apiUrl` set will get silent failures on `run_sweep` and all GSC tools.
- Orphaned messages on thread delete (`agent.ts` ~line 220) — `ON DELETE CASCADE` only fires if `PRAGMA foreign_keys = ON`, which is not guaranteed.
- Unbounded `message` field (`agent.ts` ~line 158) — no `maxLength` on the message body; a single large payload can rack up LLM token costs.
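A minimal sketch of the ordering fix the first bullet asks for (hypothetical `store` and `execute` shapes, not canonry's real modules): persist the assistant tool-call row first, so a throwing tool can never leave the thread half-written.

```typescript
// Hypothetical message store; names are illustrative, not canonry's real API.
type Row = { role: 'assistant' | 'tool'; content: string }

async function runToolCall(
  store: Row[],
  toolCall: { name: string; args: string },
  execute: (args: string) => Promise<string>,
): Promise<void> {
  // 1. Persist the assistant tool-call row BEFORE executing the tool.
  store.push({ role: 'assistant', content: `call:${toolCall.name}` })
  try {
    // 2. Execute; on success persist the matching tool-result row.
    const result = await execute(toolCall.args)
    store.push({ role: 'tool', content: result })
  } catch (err) {
    // 3. On failure, persist the error as the tool result. The assistant
    //    row already exists, so thread replay stays consistent.
    store.push({ role: 'tool', content: `error: ${(err as Error).message}` })
  }
}
```

Either way the DB ends each step with a matched assistant/tool pair, which is the invariant the replay code depends on.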
Security:
- `dns.resolve6` catch-all silently swallows errors (`sitemap-parser.ts`) — not exploitable, but could incorrectly block IPv6-only hosts.
- Missing message length limit (covered above under Bugs).
Testing gap:
- ~830 lines of new agent code ship with zero unit tests. The loop's history-windowing, maxSteps fallback, and JSON recovery paths are all critical and unverified.
Performance:
- `getTimeline` has an N+1 query pattern; `getHistory` already demonstrates the correct bulk-fetch approach.
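The bulk-fetch shape referenced here can be sketched with illustrative types (in Drizzle, the single query would use `inArray` on the `run_id` column; the grouping step afterwards is the same either way):

```typescript
// Illustrative shapes; not canonry's actual schema.
type Snapshot = { runId: string; data: string }

// Instead of one snapshot query per run (N+1), issue a single bulk query,
// e.g. WHERE run_id IN (...), then group the rows in memory.
function groupSnapshotsByRun(
  runIds: string[],
  snapshots: Snapshot[], // result of the one bulk query
): Map<string, Snapshot[]> {
  // Pre-seed every run ID so runs with zero snapshots still appear.
  const byRun = new Map<string, Snapshot[]>(runIds.map((id) => [id, []]))
  for (const s of snapshots) {
    byRun.get(s.runId)?.push(s)
  }
  return byRun
}
```

This turns N round-trips into one query plus an O(N) merge, matching what `getHistory` already does.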
What's done well ✅
- The SSRF hardening in `sitemap-parser.ts` is thorough: DNS resolution, IPv4/IPv6 loopback, link-local, ULA, and IPv4-mapped IPv6 addresses are all covered, with tests.
- History windowing (newest-N-ascending subquery) is the right approach and well-commented.
- Provider abstraction is clean — adding a new LLM is a one-function addition in `llm.ts`.
- Malformed JSON recovery in `convertToClaudeMessages` is a good defensive touch.
- DB schema with `ON DELETE CASCADE` + composite index on `(thread_id, created_at)` is solid.
Overall assessment: NEEDS_WORK on the tool-call persistence bug and the ApiClient/apiKey issues before this is safe to merge.
This review was generated by an AI agent. Please verify all suggestions.
- Fix tool-call persistence ordering: persist assistant row before
tool.execute() so DB is never left with orphaned tool results
- Guard against empty apiKey: return undefined instead of silently
constructing a broken handler
- Fall back to localhost:{port} when apiUrl is not configured so
self-hosted instances can use HTTP-backed agent tools
- Explicitly delete agent_messages before thread deletion (don't
rely on PRAGMA foreign_keys = ON)
- Add maxLength: 8000 on message body schema (Fastify/Ajv enforcement)
- Fix N+1 in getTimeline: bulk-fetch all snapshots with inArray
- Remove dead claude entry from PROVIDER_ENDPOINTS (uses dedicated path)
- Clean up duplicate projects import alias in server.ts
- Narrow dns.resolve6 catch to ENODATA/ENOTFOUND only
- Add CHECK constraint on agent_messages.role column
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
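One item in the checklist above, the `maxLength: 8000` body cap, can be sketched as a Fastify/Ajv route schema (field names beyond `message` and `provider` are assumptions, not the real schema):

```typescript
// Hedged sketch of the send-message body schema with the proposed cap.
// Fastify validates request bodies against this via Ajv before the
// handler runs, so oversized payloads are rejected with a 400.
const sendMessageSchema = {
  body: {
    type: 'object',
    required: ['message'],
    properties: {
      message: { type: 'string', minLength: 1, maxLength: 8000 },
      provider: { type: 'string', enum: ['claude', 'openai', 'gemini'] },
    },
    additionalProperties: false,
  },
} as const
```

Enforcing the cap at the schema layer keeps the limit out of handler code and makes it visible in generated API docs.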
Add a built-in AI agent that uses canonry's own tools to answer
AEO questions, run sweeps, and explain citation changes. No external
agent framework required — just the LLM provider already configured.
Architecture:
- Agent loop modeled after OpenClaw's pattern (LLM ↔ tool ↔ repeat)
- Uses existing provider API keys from canonry config
- Persistence in SQLite (same database, new tables)
- Provider priority: Claude > OpenAI > Gemini (configurable)
New files:
- packages/canonry/src/agent/ — core agent module
- loop.ts: LLM ↔ tool execution cycle
- llm.ts: provider-agnostic LLM layer (OpenAI, Claude, Gemini)
- tools.ts: canonry operations as LLM-callable functions
- store.ts: thread/message persistence (SQLite)
- prompt.ts: AEO analyst system prompt
- types.ts: shared type definitions
- packages/api-routes/src/agent.ts — REST API for chat
- packages/canonry/src/commands/agent.ts — CLI commands
CLI:
canonry agent ask <project> "message" — chat with the agent
canonry agent threads <project> — list threads
canonry agent thread <project> <id> — show thread history
API:
POST /api/v1/projects/:project/agent/threads — create thread
GET /api/v1/projects/:project/agent/threads — list threads
GET /api/v1/projects/:project/agent/threads/:id — get thread + messages
POST /api/v1/projects/:project/agent/threads/:id/messages — send message
DELETE /api/v1/projects/:project/agent/threads/:id — delete thread
Config:
agent:
provider: claude|openai|gemini (optional, auto-detects)
model: string (optional, uses provider default)
maxSteps: number (default: 10)
maxHistory: number (default: 30)
enabled: boolean (default: true if provider available)
Tools exposed to agent:
- get_status, run_sweep, get_evidence, get_timeline
- list_keywords, list_competitors, get_run_details
- get_gsc_performance, get_gsc_coverage, inspect_url
DB migration:
- agent_threads: conversation threads per project
- agent_messages: messages within threads (user/assistant/tool)
Closes #59
Fixes IDOR vulnerability where thread endpoints (get, send message, delete) accepted a :project param but never verified the thread belonged to that project. Now all three endpoints verify thread.projectId === project.id before allowing access. Addresses review comment #1 (Security - CRITICAL)
Wrap JSON.parse(toolCall.function.arguments) in try-catch to prevent crashes when LLMs return malformed JSON. On parse error, persist the error as a tool result and continue the agent loop instead of crashing. Addresses review comment #2 (Bug)
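The recovery described in this commit can be sketched as follows (function name and result shape are illustrative, not the real code):

```typescript
// Parse LLM-supplied tool arguments defensively: models occasionally emit
// malformed JSON, and a bare JSON.parse would crash the whole agent loop.
type ParseResult =
  | { ok: true; args: unknown }
  | { ok: false; error: string }

function parseToolArguments(raw: string): ParseResult {
  try {
    return { ok: true, args: JSON.parse(raw) }
  } catch (err) {
    // Surface the failure as a tool result so the loop can continue and
    // the model gets a chance to retry with valid JSON.
    return { ok: false, error: `Invalid tool arguments: ${(err as Error).message}` }
  }
}
```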
Replace dynamic imports of 'eq' and 'projects' table inside the message handler with static top-level imports to eliminate async overhead on every message. Addresses review comment #3 (Performance)
… layer

Create AgentServices class that provides direct DB access for agent tools, eliminating the circular dependency where tools called the server's own HTTP API. Most read-only tools (get_status, get_evidence, get_timeline, list_keywords, list_competitors, get_run_details) now use direct DB calls via AgentServices. Write operations (run_sweep) and external integrations (GSC) still use HTTP for proper job orchestration and auth handling.

Benefits:
- Eliminates ~1-5ms HTTP localhost roundtrip per tool call
- Removes startup timing dependency
- Simplifies auth config

Addresses review comment #4 (Architecture)
- P1: History windowing now returns newest N messages (was oldest N, causing long threads to drop the user's latest prompt)
- P1: SSRF validation now blocks localhost, IPv6 loopback/private, and resolves hostnames to verify they don't point to internal IPs
- P2: getRun() now requires projectName to prevent cross-project data access via known run IDs
- P2: getHistory() now queries snapshots for all returned runs (was only querying the first run ID)
- P2: convertToClaudeMessages() now handles malformed JSON in historical tool calls instead of crashing the thread

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Critical fixes:
- Fix history truncation splitting tool-call pairs: trim orphaned tool/assistant messages at the window boundary
- Add per-thread concurrency guard (409 Conflict if thread is busy)
- Fix get_status returning oldest 3 runs (slice(-3) → slice(0,3))
- Resolve LLM config from registry at call time instead of capturing stale API key at startup
- Merge consecutive Claude tool results into single user message to avoid invalid same-role sequences

Important fixes:
- Add 20KB truncation cap on tool results to prevent blowing up LLM context window
- Guard against empty toolCalls array causing silent spin
- Add 90s timeout on all LLM fetch calls
- Return structured error responses (502 for LLM errors) instead of generic 500s
- Fix inconsistent return shape in getHistory (evidence → snapshots)
- Add maxLength/enum validation on thread title and channel fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
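The 20KB tool-result cap mentioned above might look like this (names illustrative; the real commit may truncate differently):

```typescript
// Cap tool results before they enter the LLM context window. A single
// oversized get_evidence result would otherwise inflate every subsequent
// request in the thread.
const MAX_TOOL_RESULT_CHARS = 20 * 1024 // the 20KB cap

function truncateToolResult(result: string): string {
  if (result.length <= MAX_TOOL_RESULT_CHARS) return result
  // Keep the head and flag the cut so the model knows data is missing.
  return result.slice(0, MAX_TOOL_RESULT_CHARS) + '\n…[truncated]'
}
```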
- Rename the agent to "Aero" across CLI output and error messages
- Add soul.md as the agent's identity/personality definition (checked into repo as the default, loaded from ~/.canonry/soul.md at runtime if the user wants to customize)
- Add memory.md as persistent context that Aero accumulates — loaded from ~/.canonry/memory.md at runtime so users can prime the agent with project-specific knowledge
- System prompt now composes: soul + project context + tools + memory
- Built-in soul is embedded in prompt.ts so it works after tsup bundling
- Agent remains fully optional: no background processes, only activates on explicit user request via CLI or API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users can now choose which LLM provider Aero uses per message:
- CLI: canonry agent ask <project> "msg" --provider claude
- API: POST /agent/threads/:id/messages { message, provider: "gemini" }
The provider field is optional — omitting it uses the default
(configured in agent.provider or auto-detected: claude > openai > gemini).
If the requested provider isn't configured, returns a clear error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds the /aero route with a full chat interface for interacting with the built-in Aero agent. Includes project selector, provider/model selector, thread management (create/delete), message display with optimistic rendering, and a thinking animation during API calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… background processing

Major Aero agent improvements:
- Memory: get_memory/save_memory tools with pre-seeded domain knowledge (citation states, provider grounding mechanics, regression detection)
- Startup sequence: auto-gathers context on new threads, responds naturally
- System tools (opt-in): run_command, read_file, write_file, list_files, http_request — gated behind agent.systemTools config flag
- Write tools: add/remove keywords, add/remove competitors, update_project
- Background processing: send-message returns 202, UI polls for completion, agent work survives page navigation
- Chat UI: markdown rendering, auto-expanding textarea, inline thread rename, relative dates, cleaner thread list, no page scroll
- Claude API fix: bidirectional tool_use/tool_result validation prevents orphaned blocks from corrupting conversation history
- CLI polling: agent ask now polls thread status instead of blocking
- Remove unused footer, PATCH endpoint for thread rename, auto-titling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…quest size logging

- Reduce default maxHistoryMessages from 30 to 20 (fewer stale messages)
- Compress tool results older than 8 rows to 500 chars to prevent large get_evidence/get_memory results from inflating every subsequent request
- Add stderr logging per request: ~N tokens (M chars, K messages) for debugging
- Version 1.17.0 → 1.18.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…construction

The previous two-pass validation approach had edge cases where the passes interacted in ways that still left orphaned tool_result blocks (causing Claude 400 errors at messages.0.content.0).

New approach: state machine that walks the OpenAI-format messages once and only emits a tool call group (assistant+tool_use → user+tool_result) when ALL tool_use blocks have matching tool_result blocks. Incomplete groups from truncated history or server crashes are dropped entirely. Consecutive same-role messages are merged at the end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
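The single-pass pairing rule from this commit can be sketched on simplified message shapes (these types and the function name are illustrative, not the real `convertToClaudeMessages` internals):

```typescript
// Simplified OpenAI-format shapes, reduced to the fields that matter here.
type TextMsg = { role: 'user' | 'assistant'; content: string }
type CallMsg = { role: 'assistant'; content: string; toolCallIds: string[] }
type ToolMsg = { role: 'tool'; toolCallId: string; content: string }
type Msg = TextMsg | CallMsg | ToolMsg

// Walk the messages once; emit an assistant tool-call group only when
// EVERY tool_use id has a matching tool_result. Incomplete groups (from
// truncated history or crashes) and orphaned tool results are dropped.
function dropIncompleteToolGroups(messages: Msg[]): Msg[] {
  const out: Msg[] = []
  for (let i = 0; i < messages.length; i++) {
    const m = messages[i]
    if (m.role === 'assistant' && 'toolCallIds' in m) {
      // Collect the tool results immediately following this call group.
      const results: ToolMsg[] = []
      let j = i + 1
      while (j < messages.length) {
        const next = messages[j]
        if (next.role !== 'tool') break
        results.push(next)
        j++
      }
      const resultIds = new Set(results.map((r) => r.toolCallId))
      if (m.toolCallIds.every((id) => resultIds.has(id))) {
        out.push(m, ...results) // complete group: keep call + results
      }
      i = j - 1 // skip past the consumed results either way
    } else if (m.role !== 'tool') {
      out.push(m) // plain user/assistant text passes through
    }
    // A 'tool' message reached here has no preceding call: dropped.
  }
  return out
}
```

A real implementation would additionally merge consecutive same-role messages afterwards, as the commit describes.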
Users can now pick a specific model (e.g. Sonnet vs Opus) from the chat UI when a provider is selected. This avoids rate limit issues when the provider-level config is set to a model with low rate limits.

Model priority: request model > agent config > provider config > default. Also syncs DEFAULT_MODELS in llm.ts with MODEL_REGISTRY from contracts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts: keep both indexing API (from main) and agent features (from feat/agent). Version stays at 1.19.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unsure that this is necessary in Canonry at the moment. The real power of canonry is leveraged through an "openclaw"-type agent that has full access to the host operating system with a well-constructed memory. This agent UI feels like any other agent UI. Not a fan so far.
process.stdout.write('Aero is thinking...')
}
…
await client.sendAgentMessage(project, threadId, message, opts?.provider)
Missing error handling on sendAgentMessage: If this call throws (e.g. network error, 4xx/5xx), the exception propagates uncaught and the polling loop below never runs — which is fine. However, if the server accepts the message but never transitions out of 'processing', the loop silently times out and returns an empty string (see lines 62–80). A tighter pattern would be to await this call, then check the thread status before entering the poll loop.
if (opts?.format !== 'json') process.stdout.write('.')
}
…
if (opts?.format !== 'json') console.log('\n')
Silent timeout — no error signal when the agent takes >3 minutes: When `i` reaches 120 (120 × 1500ms = 3 min), the loop exits without setting process.exitCode or printing an error. The caller receives an empty response string and exit code 0, which looks like success.
// After the loop:
if (!response) {
  console.error('\nTimed out waiting for agent response (3 min).')
  process.exitCode = 1
  return
}

await validateSitemapUrl(url)
}
…
const res = await fetch(url)
DNS rebinding / TOCTOU: The DNS check in validateSitemapUrl (lines 66–88) happens before fetch(url). A malicious DNS server can return a public IP during validation, then switch to a private IP for the actual fetch request — this is a classic DNS rebinding attack.
Full mitigation requires resolving the hostname to an IP, asserting it's public, then connecting to that specific IP directly (e.g. by passing a custom agent to fetch that pins the resolved IP). The current approach significantly raises the bar vs. the old static regex check, but it is not bulletproof. Worth adding a comment documenting this known limitation so it's not mistaken for a complete fix.
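The address-classification half of that check can be sketched as a pure function (illustrative and deliberately not exhaustive — and as noted above, a complete rebinding defense must also pin the validated IP for the actual fetch):

```typescript
import net from 'node:net'

// Classify an already-resolved IP as private/internal. Covers the ranges
// the review mentions: loopback, RFC 1918, link-local, ULA, and
// IPv4-mapped IPv6. IPv4-mapped addresses are blocked wholesale here,
// which is conservative; a real check could recurse into the embedded v4.
function isPrivateAddress(ip: string): boolean {
  if (net.isIPv4(ip)) {
    const [a, b] = ip.split('.').map(Number)
    return (
      a === 10 || a === 127 || a === 0 ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168) ||
      (a === 169 && b === 254) // link-local
    )
  }
  const lower = ip.toLowerCase()
  return (
    lower === '::1' || lower === '::' ||
    lower.startsWith('fc') || lower.startsWith('fd') || // ULA fc00::/7
    lower.startsWith('fe80:') || // link-local
    lower.startsWith('::ffff:') // IPv4-mapped
  )
}
```

To actually defeat rebinding, the resolved-and-validated IP would then be dialed directly (e.g. via a custom undici Agent), with the original hostname carried in the Host header and for TLS SNI.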
const services = new AgentServices(db)
…
// ApiClient is only needed for HTTP-backed tools (run_sweep, GSC).
// If apiUrl/apiKey aren't set (self-hosted), those tools will gracefully error.
Security: systemTools defaults to false — good. Consider adding a prominent warning when it's enabled.
When agent.systemTools: true, the built-in agent gains shell execution, file I/O, and HTTP request capabilities. Since the agent operates on user-supplied message content, an adversarial input could abuse these tools. At minimum, log a startup warning when systemTools is true so operators notice, and document clearly in the config schema that this is a dangerous option.
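A minimal sketch of such a startup warning (function name and wording are illustrative):

```typescript
// Build the operator-facing warning emitted at startup when the
// dangerous opt-in flag is set; returns null when nothing needs logging.
function buildSystemToolsWarning(enabled: boolean): string | null {
  if (!enabled) return null
  return (
    'WARNING: agent.systemTools is enabled. The agent can run shell ' +
    'commands, read/write files, and make HTTP requests based on ' +
    'user-supplied messages. Only enable this on trusted installs.'
  )
}
```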
arberx
left a comment
🤖 Automated Review — PR #74 (incremental: new commits since last review)
Summary: Solid agent foundation. The LLM loop, tool system, and CLI commands are well-structured. A few issues flagged inline:
| Severity | Finding | File |
|---|---|---|
| 🟠 Bug | Silent timeout — loop exits with exitCode=0 + empty response after 3 min | `commands/agent.ts:83` |
| 🟠 Bug | `--wait` polls `indexingState` (wrong field — means allowed, not indexed) | `commands/google.ts` (already in main) |
| 🔴 Security | DNS rebinding TOCTOU in `validateSitemapUrl` | `sitemap-parser.ts:110` |
| 🟡 Robustness | `sendAgentMessage` failure not linked to poll-loop lifecycle | `commands/agent.ts:58` |
| 🟡 Security | `systemTools: true` silently grants shell/file/network to agent | `server.ts:509` |
The `--wait` / `INDEXING_ALLOWED` bug is already on main — worth a hotfix or follow-up issue independent of this PR.
…shboard bar (#332) * chore(agent): remove OpenClaw gateway and bundled runtime Strips the OpenClaw-backed agent runtime ahead of the native in-process loop. Keeps the external-agent webhook contract (`canonry agent attach <project> --url <url>` / `agent detach`) so existing subscribers keep working; drops the setup/install/lifecycle surface entirely. BREAKING CHANGE: the following CLI commands are removed — `canonry agent setup`, `canonry agent start`, `canonry agent stop`, `canonry agent status`, `canonry agent reset`. `canonry agent attach` now requires `--url <webhook-url>` instead of deriving the URL from the former `config.agent.gatewayPort`. The `config.agent.{binary,profile,autoStart, gatewayPort}` fields are removed; only `config.agent.mode` remains (reserved until the native loop ships). Users with an orphaned `~/.openclaw-aero/` directory get a one-time boot-time warning. - Deletes agent-bootstrap.ts, agent-manager.ts, and their tests (agent-bootstrap.test, agent-manager.test, agent-config.test, agent-commands.test, agent-webhook.test). - Trims agent-webhook.ts to just the webhook event list consumed by `agent attach`. - Updates server.ts to drop AgentManager construction, auto-attach webhook hook, and graceful shutdown of the gateway. - Rewrites commands/agent.ts and cli-commands/agent.ts to expose only attach/detach; adds `--url` to attach. - Drops the OpenClaw integration test script. - Keeps the aero skill — it's target-agnostic (Claude Code, Codex, pi-agent-core all read the same skill content). Rewrites its memory-patterns reference to match the "canonry is source of truth, query don't duplicate" model. - Refreshes AGENTS.md files, README.md, and the canonry-setup CLI reference to reflect the new surface. - Major version bump: 1.48.4 → 2.0.0. * feat(agent): scaffold pi-agent-core integration Adds @mariozechner/pi-agent-core, @mariozechner/pi-ai, and @sinclair/typebox as direct deps on packages/canonry. 
Introduces packages/canonry/src/agent/pi-runtime.ts — a thin factory that constructs a pi-agent-core Agent scoped to a canonry project. Stack 1 of the native agent loop: proves the dep graph and import surface. Upcoming stacks wire convertToLlm, transformContext, event-driven persistence, tool definitions, and the beforeToolCall policy gate. - Bump to 2.0.1. * feat(agent): port 3 read tools to pi-agent-core shape Stack 2 pattern-prover: ports get_status, get_health, and get_timeline from PR #74's shape to pi-agent-core's AgentTool with @sinclair/typebox schemas. Locks the pattern before batching the remaining tools. Tools consume the existing ApiClient directly — no AgentServices shim. Aero uses the same API surface as any external agent, keeping the agent-first contract. Project name is bound via the ToolContext closure, not an LLM-visible argument — prevents the model from targeting the wrong project. - packages/canonry/src/agent/tools.ts — ToolContext, buildReadTools - packages/canonry/test/agent-tools.test.ts — 6 tests covering tool construction, default params, override params, and filter behavior - Bump to 2.0.2 * feat(agent): canonry agent ask — one-shot CLI backed by pi-agent-core Stack 3 of the native agent loop: wires a full Aero session and exposes it as `canonry agent ask <project> "<prompt>"`. First dogfoodable slice of the native loop. Session module (agent/session.ts) composes: - System prompt loaded from skills/aero/SKILL.md (bundled asset or repo-root fallback) - Pi-ai model resolution (default anthropic/claude-opus-4-7; falls back through openai and google based on which canonry API key is present) - Read tools from the stack 2 port (get_status, get_health, get_timeline) - getApiKey resolver that maps pi-ai provider names to the canonry config keys (anthropic→claude, google→gemini) CLI command (commands/agent-ask.ts) subscribes to AgentEvents and prints them — tool calls, tool results, assistant text. Supports --provider, --model, --format json. 
Integration tests use pi-ai's faux provider to exercise the full prompt→events→idle lifecycle without hitting a real LLM. 7 tests cover prompt loading, provider detection, end-to-end event sequence, and the no-provider-configured error path. - Bump to 2.1.0 — first shippable agent feature. * feat(agent): add z.ai (GLM) provider + env-var API key fallback Wires pi-ai's built-in `zai` provider into session.ts as a fourth SupportedAgentProvider option, with glm-5.1 as the default model. detectAgentProvider now considers zai alongside anthropic, openai, and google. Extends buildApiKeyResolver and detectAgentProvider with a pi-ai getEnvApiKey() fallback — if no canonry config entry is present, pulls from ANTHROPIC_API_KEY / OPENAI_API_KEY / GEMINI_API_KEY / ZAI_API_KEY. Removes the need to persist ephemeral keys to ~/.canonry/config.yaml when dogfooding. CLI: `canonry agent ask --provider zai` is now valid. - Bump to 2.1.1 * feat(agent): batch-port remaining read tools (get_insights, list_keywords, list_competitors, get_run) Stack 4: brings the Aero read surface from 3 tools to 7. - get_insights — intelligence engine output (regressions/gains/opportunities with cause + recommendation metadata). Agents should query this instead of re-deriving conclusions from raw timeline rows. - list_keywords / list_competitors — tracking scope. - get_run — drill into a specific run by id. Particularly useful after get_status surfaces a failed run. Dropped get_evidence from the original PR #74 list — canonry's evidence command is just getTimeline() with a "cited" boolean convenience, so it would be redundant against the existing get_timeline tool. - Bump to 2.2.0 * feat(agent): write tools — run_sweep, dismiss_insight, add_keywords, add_competitors, update_schedule, attach_agent_webhook Stack 5: gives Aero the ability to act, not just analyze. The agent now closes the "want me to kick that off?" 
loop that the previous stacks ended on — run_sweep actually triggers the sweep, attach_agent_webhook wires external agents, and so on. Six additive-only write tools (no destructive surface yet — Aero can recommend removals in prose, not enact them): - run_sweep — POST /projects/:name/runs, optional provider filter - dismiss_insight — POST /intelligence/insights/:id/dismiss - add_keywords — POST /projects/:name/keywords (append semantic) - add_competitors — read + merge-dedup + PUT /competitors - update_schedule — PUT /projects/:name/schedule with cron xor preset - attach_agent_webhook — idempotent notification create, source='agent' Write tool calls surface via tool_execution_start events so the user sees exactly what fired. No confirmation gating in this stack — opt-in by running `canonry agent ask`. Confirmation policy lands alongside the UI stack when a chat surface exists to ask through. buildAllTools(ctx) combines reads + writes (13 tools total); session.ts now defaults to the full set. Callers can narrow to reads-only via the `tools` override. - Bump to 2.3.0 * feat(agent): persistent session registry — agent_sessions table + SessionRegistry Stack 6a of proactive Aero: the hybrid persistence layer underneath RunCoordinator → agent.followUp(). Live pi-agent-core Agent instances stay in memory per project; the durable state (transcript + queued follow-up messages + chosen provider/model) lives in agent_sessions. Schema: - agent_sessions table — one row per project (UNIQUE on project_id). Columns: system_prompt, model_provider, model_id, messages JSON, follow_up_queue JSON, created_at, updated_at. - Migration v38 added to packages/db. SessionRegistry API: - getOrCreate(projectName) — returns cached live Agent, hydrates from DB if persisted (draining follow_up_queue into the live followUp queue), or constructs + inserts a fresh row. - save(projectName) — persists state.messages back to the row. 
- queueFollowUp(projectName, msg) — forwards to live agent if cached; otherwise appends to the DB row's queue; buffers pre-session messages until the first getOrCreate creates the row. - evict(projectName) / clear() — drop live Agent(s); durable state untouched. No behavior change for `canonry agent ask` yet — the CLI still uses the per-invocation createAeroSession path. Stack 6b wires RunCoordinator to the registry so run.completed / insight.* events drive followUp; stack 6c drains queued follow-ups unprompted (the actual proactive moment). - docs/data-model.md gets an Agent section. - Tests cover insert, live hot path, rehydration-with-queue-drain, idle queue persistence, and the pre-session buffer. - Bump to 2.3.1 * feat(agent): proactive Aero — registry-driven CLI + RunCoordinator wake-up Completes stack 6: Aero now wakes up unprompted when runs complete, and CLI conversations thread across invocations via the persistent session registry. SessionRegistry refactor: - Drops reliance on pi's internal follow-up queue. The registry owns the pending buffer directly; drainNow / the next user prompt consumes and forwards via agent.prompt(). - queueFollowUp routes by session liveness (live → pending Map; idle → persisted queue). - drainNow (async, fire-and-forget safe) hydrates if needed, consumes pending, prompts the Agent, saves transcript back. - consumePending exposes the drain primitive so the CLI can bundle queued events into the next user prompt in a single turn. CLI switch (commands/agent-ask.ts): - Opens a DB connection + migrates, constructs SessionRegistry per invocation, getOrCreate the session, bundles any pending events in front of the user's prompt, runs, then saves. - Conversations now persist across `canonry agent ask` invocations — the next call sees prior transcript and any queued follow-ups. RunCoordinator wiring: - New OnAeroEvent callback added as a third subscriber (after intelligence + notifier). 
Receives runId / projectId / insight counts; returns Promise<void>. - server.ts constructs SessionRegistry + ApiClient directly from the loaded config (not loadConfig, to avoid test-ordering issues) and passes a callback that enqueues a "[system] Run X completed…" message and fires drainNow. Tests: registry suite refactored to match the new pending-buffer model (10 cases). Full workspace: 1002 tests green. - Bump to 2.4.0 — first stack where Aero acts without being prompted. * fix(agent): prevent duplicate follow-up + drain pending after each turn Two bugs surfaced by the first live proactive wake-up: 1. queueFollowUp wrote to BOTH the in-memory pending Map AND the DB follow_up_queue when no live session existed. Then drainNow's getOrCreate hydrated the DB queue INTO pending, producing a second copy. First end-to-end test showed the [system] "Run X completed" message appearing twice in the transcript. Fix: queueFollowUp now writes to exactly one sink — pending when live, DB when idle. Hydration is the only path that moves DB → pending. 2. No drain trigger between turns. If RunCoordinator fired while a CLI session was mid-turn, the queued message landed in pending but nothing drained it until someone called drainNow explicitly. Fix: getOrCreate now subscribes to agent_end on the live Agent and fires drainNow() whenever pending has items. Re-entrant safely — the drain calls prompt() which itself emits a new agent_end, so we stay single-threaded via pi's internal guards. Added a regression test covering the duplicate scenario (evict → queueFollowUp while idle → drainNow should yield exactly one copy of the queued message in the transcript). - Bump to 2.4.1 * feat(agent): SSE agent routes — transcript GET/DELETE + prompt stream Stack 7a of the dashboard UI: thin Fastify routes that wrap the SessionRegistry for the browser. 
Same abort/save lifecycle as the CLI path; envelope shape differs only by the stream_open / stream_close control frames that let the client distinguish a clean close from a network drop. Routes (all under `${apiPrefix}/projects/:name/agent/`): GET .../transcript — current rolling messages + model config DELETE .../transcript — reset the conversation POST .../prompt — { prompt, provider?, modelId? } → SSE of AgentEvent JSON lines SSE envelope: each frame is `data: <JSON>\n\n`. Pi-ai AgentEvents pass through verbatim; stream_open + stream_close bracket the conversation; error frames surface prompt failures without collapsing the stream. Client disconnect aborts the live Agent (`agent.abort()`), so navigating away mid-turn stops the LLM call instead of burning tokens to /dev/null. Registered in server.ts before apiRoutes so the shared base prefix + session registry are already in scope. Not wired from api-routes — Aero stays canonry-local until the cloud API explicitly opts in. - 1003 tests green - Bump to 2.5.0 * feat(web): Aero bottom command bar — dashboard surface for the native loop Stack 7b of the native agent loop: the browser-facing UI for Aero. Design (per project_aero_ui_direction memory): a fixed bottom bar, not a chat panel. Collapsed state shows "Ask Aero about <project>…"; clicking expands upward into a composer + rolling transcript. Only renders on project-scoped routes — hidden on overview, settings, setup, since there's no project context to ask about. Components: - apps/web/src/api-aero.ts — typed AeroEvent / AeroMessage shape, fetchAeroTranscript / resetAeroTranscript / promptAero. promptAero parses the SSE framing (data: JSON\n\n) and fires onEvent per frame; the caller passes an AbortSignal that translates to canceling the underlying fetch (server aborts the run on disconnect). - apps/web/src/components/shared/AeroBar.tsx — bar + AeroBarHost. Host reads router location and shows the bar iff the path matches /projects/<name>. 
Starter buttons (Status / Top insights / Last failed run / Schedule) fire canned prompts for zero-friction first use. Tool calls stream inline as emerald pills. [system] follow-up messages from RunCoordinator are filtered out of the transcript — they're internal plumbing, not user-facing.

- Mounted in App.tsx RootLayout alongside Toaster.

In-turn streaming text is mirrored from message_update events so users see tokens as they arrive. On message_end the streaming buffer clears and the final message slots into the transcript via a fresh transcript fetch (resyncing in case events landed post-end).

- 1003 tests green
- Bump to 2.6.0

* docs(agent): sync AGENTS.md + README + canonry-cli skill for native Aero

Stack 8 of the native agent loop: bring the user- and agent-facing docs into line with the shipped surface. The previous pass (commit 573cb96 during the OpenClaw rip-out) described a stripped-down, webhook-only world; that's no longer accurate — we've shipped the full Aero agent on pi-agent-core with 13 tools, proactive wake-ups, and a dashboard bar. The docs catch up.

- AGENTS.md (root): Agent Layer section rewritten. CLI reference updated with `agent ask`. Key files listed (session.ts, session-registry.ts, tools.ts, agent-routes.ts, AeroBar.tsx). The external-agent webhook path is kept as a separate subsection for BYO-agent users.
- packages/canonry/AGENTS.md: the Key Files table now lists the five new agent-module files. The agent-layer section is split into built-in / external.
- README.md: "Talking to Aero" section for the CLI; "Bringing your own agent" for webhooks. The first Features bullet promotes the built-in agent. The intro paragraph mentions Aero + pi-agent-core by name.
- skills/canonry-setup/references/canonry-cli.md: ask command with provider table + env-var fallback order. Persistence behavior documented.

Docs-only — no version bump.
* fix(agent): move prompt-stream abort listener from request to response side

Symptom: the SSE prompt endpoint produced assistant messages with empty content and stopReason=aborted. No LLM output surfaced in the dashboard bar.

Root cause: `request.raw.on('close', ...)` fires as soon as the client finishes uploading the POST body — normal for every POST — not when the client disconnects from the response stream. So the abort handler was firing immediately after the prompt arrived, canceling the agent before pi-ai even started the LLM request.

Fix: listen on `reply.raw.on('close')`. The response-side socket close fires only when the client actually drops the connection mid-stream, which is the signal we actually want.

Verified via `curl -X POST … /agent/prompt`: the full event sequence now fires (tool_execution_start/end, final message_end with real assistant text, clean stream_close).

- Bump to 2.6.1

* fix(agent): code-review fixes — auth, listener leaks, reader abort

Addresses findings from the pre-PR code review:

BLOCKER: the /api/v1/projects/:name/agent/* routes were registered on the outer Fastify instance, bypassing the authPlugin that's scoped inside apiRoutes' encapsulated plugin. Anyone reaching the port could read transcripts, reset them, or drive Aero with the operator's LLM key. Fix: api-routes now exposes a `registerAuthenticatedRoutes` hook that runs inside the authenticated scope. Canonry passes `registerAgentRoutes` through this hook so Aero shares the bearer-key / session-cookie auth. Verified via `curl` — 401 without auth, 200 with the API key.

Other fixes:
- agent-routes.ts: `reply.raw.once('close')` instead of `.on('close')` so we don't retain the closure over `agent` across GCs.
- session-registry.ts: `drainNow` now leaves pending messages in the queue when `isStreaming` is true (the agent_end drain hook will pick them up). Prevents event loss if two run.completed events land back-to-back.
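The response-side abort wiring can be sketched with a bare AbortController (a simplification: the real route calls `agent.abort()`, and `replyRaw` stands in for Fastify's `reply.raw`):

```typescript
import { EventEmitter } from "node:events";

// Sketch of the fix: cancellation keys off the RESPONSE socket closing,
// not the request side (which closes as soon as the POST body finishes
// uploading). `.once` rather than `.on` so the closure is released
// after it fires.
function wireDisconnectAbort(replyRaw: EventEmitter): AbortController {
  const controller = new AbortController();
  replyRaw.once("close", () => controller.abort());
  return controller;
}
```

The same shape applies to any long-lived SSE handler: the only "client went away" signal worth acting on is the response stream's close event.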
- commands/agent-ask.ts: sets `process.exitCode = 2` when a stream emits an assistant message with `stopReason === 'error'` or an errorMessage. Agents scripting against the CLI now get a non-zero exit on silent provider failures instead of a false success.
- api-aero.ts: wire the caller's AbortSignal to `reader.cancel()` so aborting a prompt mid-stream unblocks `reader.read()` immediately.
- AeroBar.tsx: use a stable key (`role:timestamp:index`) for message rows so React doesn't churn when the transcript re-fetch returns.
- Deleted unused `src/agent/pi-runtime.ts` + its test — callers use `@mariozechner/pi-agent-core` directly.

Verified live:
- auth: 401 vs 200 on agent routes
- full SSE turn: token-by-token streaming through message_update, clean stream_close, final assistant text
- 1000 tests green
- Bump to 2.6.2

* fix(agent): staff-review pass — CLI→HTTP, scope gate, 409 on concurrent, model override, proactive polling

Addresses the five P1/P2 findings from the pre-merge review.

P1-1: `canonry agent ask` was running its own local DB + SessionRegistry, which broke against remote/shared canonry servers. Now the CLI posts to `/api/v1/projects/:name/agent/prompt` with `scope: 'all'` and parses the SSE stream — the same session store as the dashboard, one execution path. Auth via the bearer API key; SIGINT cancels the in-flight fetch.

P1-2: two overlapping `/agent/prompt` requests on the same project shared the same per-project Agent instance — tool/message events cross-streamed, and either client closing could abort the other's run. Added a `state.isStreaming` guard that returns `409 AGENT_BUSY` with the new `agentBusy()` error factory. Verified: two parallel curls → first=200, second=409.

P1-3: dashboard sessions were getting the full read+write toolset with no confirmation UX, so a free-form prompt could trigger sweeps or mutate schedules from the command bar. Split `toolScope`:
- Dashboard `/agent/prompt` defaults to `read-only` (7 tools).
- CLI passes `scope: 'all'` to keep write tools available (13 tools).

Tools swap per-request on the cached Agent; this is safe because we now 409 concurrent requests (the Agent is idle when tools swap).

P2-4: `provider` / `modelId` flags were silently ignored once a session row existed — `getOrCreate` always rehydrated with the persisted model. Now explicit preferences override the persisted values AND are persisted back, so subsequent invocations use the new model unless the caller specifies otherwise.

P2-5: proactive turns from RunCoordinator wake-ups were invisible in the dashboard because the bar only fetched the transcript on open or after a user prompt. Added a 15-second poll while the bar is open and no prompt is in flight. Server-initiated turns now surface within a poll cycle.

Addressed open questions:
- CLI transcript parity: added `canonry agent transcript <project>` and `canonry agent reset <project>` subcommands (GET + DELETE transcript). Previously only the dashboard could read/reset the transcript; agents scripting the CLI had no equivalent.
- OpenAPI coverage: the new `agent/*` endpoints are documented via an opt-in `canonryLocalRouteCatalog` that only activates when the caller passes `includeCanonryLocal: true` (canonry does; shared api-routes doesn't, so the strict contract test still passes).
- New error code: `AGENT_BUSY` (409) in packages/contracts/src/errors.ts
- Added `scope?: 'all' | 'read-only'` to the prompt request body
- 1000 tests green
- Bump to 2.7.0 (minor: two new CLI commands, a new error code, and a behavior change on dashboard tool scope)

* chore: hold version at 2.0.0 — native-agent-loop ships as a single release

Reverts the 2.0.1 → 2.7.0 churn introduced across the branch. Upstream main is still on 1.x; this entire feature will land as 2.0.0.
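The busy guard plus per-request scope swap described in P1-2/P1-3 reduces to a small ordering rule, sketched here with illustrative names (not the real SessionRegistry surface): reject while streaming, and only mutate the cached session once the guard has passed.

```typescript
// Sketch: a 409 must never mutate the in-flight session's tool scope.
type Scope = "all" | "read-only";

interface SessionState {
  isStreaming: boolean;
  toolScope: Scope;
}

class AgentBusyError extends Error {
  readonly statusCode = 409;
  constructor() {
    super("AGENT_BUSY");
  }
}

function acquireForTurn(state: SessionState, wantedScope: Scope): SessionState {
  // 1. Busy guard FIRST: a concurrent request gets a 409 and leaves
  //    the running turn's tools untouched.
  if (state.isStreaming) throw new AgentBusyError();
  // 2. Only now is it safe to re-align the tool scope (agent is idle).
  state.toolScope = wantedScope;
  return state;
}
```

This is the same property the regression test in the later reviewer pass asserts: the guard throws without touching `state`.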
* refactor(agent): consolidate provider/model maps into a single registry

The three hand-written parallel maps — the `SupportedAgentProvider` union, `DEFAULT_MODEL_IDS`, and `CANONRY_PROVIDER_KEY` — plus the auto-detect priority list and the CLI's separate `AGENT_PROVIDERS` validation array were five places a new provider had to be added. Nothing tied them together, so a typo or missing entry was a silent bug at runtime.

New shape in `packages/canonry/src/agent/providers.ts`:

    AGENT_PROVIDERS = {
      anthropic: { piAiProvider, label, canonryConfigKey, defaultModel, autoDetectPriority },
      openai: {...},
      google: {...},
      zai: {...},
    } as const satisfies Record<string, AgentProviderEntry>

Everything downstream is derived:
- `SupportedAgentProvider` is `keyof typeof AGENT_PROVIDERS`.
- `AgentProviders` is the canonical enum constant (like RunKinds).
- `listAgentProviders`, `agentProvidersByPriority`, `getAgentProvider`, `coerceAgentProvider`, `findByPiAiProvider`, `resolveApiKeyFor`, and `resolveModelForProvider` all read from the single table.
- `validateAgentProviderRegistry()` runs at the first session construction and throws if any default model is missing from the installed pi-ai catalog — surfacing registry drift early instead of at a user's first prompt.

Adding a new provider (say Mistral or Bedrock) is now one row in the registry. CLI validation, auto-detect priority, env-var resolution, and model defaulting all update without further edits.

Call sites updated:
- session.ts uses the registry helpers; dropped the hand-rolled maps.
- cli-commands/agent.ts uses `coerceAgentProvider` + `listAgentProviders` instead of maintaining its own parallel array.

12 new tests in agent-providers.test.ts cover registry invariants (unique priorities, every default resolves against pi-ai, every row has the required fields, coercion behavior, apiKey resolution).
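A condensed, self-contained sketch of the derived-registry pattern, with two example rows whose field values are placeholders rather than the project's real defaults:

```typescript
// Single source of truth; every downstream name is derived from this table.
interface AgentProviderEntry {
  piAiProvider: string;
  label: string;
  defaultModel: string;
  autoDetectPriority: number;
}

const AGENT_PROVIDERS = {
  anthropic: { piAiProvider: "anthropic", label: "Claude", defaultModel: "claude-sonnet", autoDetectPriority: 1 },
  openai:    { piAiProvider: "openai",    label: "OpenAI", defaultModel: "gpt-4o",        autoDetectPriority: 2 },
} as const satisfies Record<string, AgentProviderEntry>;

// Derived union type: adding a registry row automatically widens it.
type SupportedAgentProvider = keyof typeof AGENT_PROVIDERS;

function listAgentProviders(): SupportedAgentProvider[] {
  return Object.keys(AGENT_PROVIDERS) as SupportedAgentProvider[];
}

function agentProvidersByPriority(): SupportedAgentProvider[] {
  return listAgentProviders().sort(
    (a, b) => AGENT_PROVIDERS[a].autoDetectPriority - AGENT_PROVIDERS[b].autoDetectPriority,
  );
}

// Runtime validation and the static union stay in lock-step for free.
function coerceAgentProvider(raw: string): SupportedAgentProvider | undefined {
  return raw in AGENT_PROVIDERS ? (raw as SupportedAgentProvider) : undefined;
}
```

The `as const satisfies` combination is what makes this work: `satisfies` checks every row against `AgentProviderEntry` without widening the keys, so `keyof typeof` still yields the literal provider names.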
- 1012 tests green (was 1000; +12 for the registry suite)

* fix(agent): reviewer pass 3 — acquireForTurn guard, CLI via ApiClient+CliError, hot-session model swap

Addresses the P1/P2 findings from the staff re-review.

P1-1: the busy-check now runs BEFORE any cached-Agent mutation. Introduced `SessionRegistry.acquireForTurn(name, prefs)`. It:
1. Calls getOrCreate (pure — never mutates a cached Agent).
2. Throws `AGENT_BUSY` (409) if `state.isStreaming` is true.
3. Only after the busy guard passes does it align tool scope and optionally swap the model.

Previously `getOrCreate` eagerly re-scoped tools on every lookup and THEN the route checked busy — so a dashboard read-only request could swap tools out from under an in-flight CLI `scope: 'all'` turn before getting its 409. Agent-routes now calls `acquireForTurn`; drainNow does too (catching AGENT_BUSY and leaving pending for the agent_end hook). Added a regression test that asserts `acquireForTurn` throws without mutating `state.tools` when the Agent is streaming.

P1-2: the CLI goes through `createApiClient()` + `CliError`. `agent-ask.ts` and `agent-transcript.ts` no longer build raw URLs and bypass ApiClient. Added `ApiClient.streamPost(path, body, signal)`, which shares the existing probe + auth + structured-error path, returning a Response whose body the caller streams. Added `ApiClient.getAgentTranscript()` / `resetAgentTranscript()` for the transcript + reset subcommands. Errors now surface via `printCliError` with a proper `CliError.exitCode`, matching the repo's 0/1/2 contract. The ApiClient `/health` probe also re-engages for these calls, so reverse-proxied deployments without a local basePath config resolve correctly (previously we'd 404).

P2-1: `--provider` / `--model` now affect hot cached sessions too. `acquireForTurn` aligns `state.model` on the cached Agent (not just on DB rehydration) when preferences change the provider or model id, and persists the new choice back to the `agent_sessions` row.
A regression test verifies that a cached agent's `state.model` changes when `provider: 'zai'` is passed to a session that was constructed with `provider: 'anthropic'`.

P2-2: version manifests were intentionally reverted to 2.0.0. An earlier review reply mentioned 2.7.0; that was per-commit churn the repo owner asked me to roll back. The feature ships as a single 2.0.0 release — upstream main is still on 1.x.

- +3 tests → 1015 workspace tests green

* feat(web): Aero bar expand-to-fullscreen toggle

Adds Maximize/Minimize icons in the bar header. Clicking promotes the bar to a near-fullscreen overlay (max-w-5xl, backdrop-blurred, textarea grows to 3 rows). Clicking again snaps back to the compact bottom bar. The Escape key collapses expanded → compact → closed in sequence. Clicking the backdrop in expanded mode collapses back to compact (but doesn't close the session — the transcript persists).

No logic changes. Pure presentation state. AeroBar still renders nothing outside project routes.

* feat(web): Aero bar — typing indicator + rendered markdown

Two UX fixes from the live dashboard pass.

Typing indicator: three pulsing emerald dots labeled "Aero" appear in the transcript whenever the session is streaming but hasn't emitted any assistant text or tool pills yet. Covers:
- the pre-first-token "thinking" moment after a user prompt,
- the post-tool-result "analyzing" moment between tool rounds.

It hides as soon as streaming text arrives or a tool-execution pill takes the spotlight, and respects prefers-reduced-motion.

Markdown rendering: replaces the raw-text transcript with react-markdown. Headings, tables, lists, bold/italic, inline + block code, blockquotes, hr, and links all render with tailwind overrides that match the zinc/emerald dashboard palette — no browser-default blue underlines or black text. Links open in a new tab with noopener/noreferrer.
Both the finalized assistant messages and the in-flight streamingText go through the same renderer, so tokens arrive formatted rather than as raw asterisks.

- New dep: react-markdown in apps/web
- New CSS: .aero-dot keyframes in styles.css

* feat(agent): provider switch, tool trails, slash palette, copy-as-CLI in AeroBar

Adds four dashboard agent-surface features so Aero feels like a real agentic console instead of a chat box:

- Provider picker in the AeroBar header with per-turn override. A new GET /projects/:name/agent/providers endpoint returns the full registry (provider id, model id, keySource, defaultProvider), backed by the new AgentProvidersResponse DTO in @ainyc/canonry-contracts.
- Inline tool trails render each tool call as a collapsible card with running/ok/failed state, duration, and expandable args + result JSON — pulled from the SSE tool_execution_* frames.
- Slash-command palette: `/` in the composer opens a Raycast-style menu of 8 curated prompts (status, insights, last-run, last-failed, run-sweep, schedule, keywords, competitors) with live filtering and Arrow/Tab/Enter/Escape keybindings.
- "Copy as CLI" on hover of user messages writes `canonry agent ask <project> "<prompt>"` to the clipboard with POSIX-safe quoting, honoring the agent-first CLI/API parity principle.
- Context pills above the composer surface project/model/scope. A scope chip toggles read-only ↔ all tools; the all-tools state is amber to signal elevated access. The scope flows through promptAero → the prompt endpoint → SessionRegistry.acquireForTurn, so the pi-agent-core tool surface is filtered per-turn.

- 91 lines of tests covering the provider registry, resolveApiKeyFor/Source, and buildAgentProvidersResponse under config vs env key sourcing.
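The POSIX-safe quoting behind "Copy as CLI" presumably follows the standard wrap-in-single-quotes rule, escaping embedded single quotes as `'\''`. A hypothetical sketch (function names are illustrative, not the real AeroBar helpers):

```typescript
// Single-quote an argument for a POSIX shell: everything inside single
// quotes is literal, so the only character needing escape is ' itself,
// rewritten as '\'' (close quote, escaped quote, reopen quote).
function posixQuote(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

function copyAsCli(project: string, prompt: string): string {
  return `canonry agent ask ${posixQuote(project)} ${posixQuote(prompt)}`;
}
```

With this rule, a prompt like `what's failing?` pastes safely into any POSIX shell without word splitting or glob expansion.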
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(agent): reviewer pass 4 — scope parity, providers CLI, OpenAPI

Addresses three P2 findings against a18bb82:

* Copy-as-CLI now threads the current UI scope through `canonry agent ask --scope`, so a pasted read-only turn can't quietly upgrade to write-capable. The CLI default stays `all`; the flag is omitted when the UI is in `all` mode to keep pastes terse. A new vitest covers the three paths.
* `canonry agent providers <project>` — CLI parity for the dashboard provider picker, with `--format json` and the same `AgentProvidersResponse` shape the UI consumes.
* `/api/v1/projects/{name}/agent/providers` is now listed in the canonry-local OpenAPI catalog (the route existed but wasn't documented).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(agent): review pass 5 — scope-preserving drain, ER diagram, --scope skill docs

- Preserve the session's current tool scope during proactive drains in SessionRegistry.drainNow; fail closed to 'read-only' when no scope is cached. This prevents a run.completed-triggered follow-up from silently escalating a read-only dashboard session to the full 13-tool write surface.
- Add agent_sessions to the docs/data-model.md ER diagram (the prose half was already present).
- Document --scope all|read-only on canonry agent ask in the canonry-setup skill reference, including the rationale for the dashboard's read-only default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(canonry): refresh bundled SPA asset hash

Rebuild of the bundled dashboard produced by the local test run. index.html now references the new hashed bundle name.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(agent): soul.md grounding + progressive-disclosure skill docs

Compose Aero's system prompt from two files: skills/aero/soul.md (identity/values/voice/boundaries) + skills/aero/SKILL.md (task rules). Soul is prepended so identity frames judgment.
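The fail-closed scope rule from the scope-preserving drain fix above is small enough to sketch (map name and function are hypothetical, not the real drainNow internals):

```typescript
// Sketch: a proactive, server-initiated drain reuses the session's cached
// scope and defaults to 'read-only' when nothing is cached, so a
// run.completed follow-up can never self-escalate to the write toolset.
type Scope = "all" | "read-only";

const scopeCache = new Map<string, Scope>();

function scopeForDrain(project: string): Scope {
  return scopeCache.get(project) ?? "read-only";
}
```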
Add two skill-doc tools for progressive disclosure of bundled reference playbooks — SKILL.md stays lightweight, playbooks load on demand:
- list_skill_docs — scans references/*.md and parses the description frontmatter
- read_skill_doc({ slug }) — validates the slug against the manifest and returns the content

Skill-doc tools ride in every scope (read-only and all). The registry's alignScope now preserves them across scope realignment.

Consolidate soul to a single source — delete the duplicate assets/agent-workspace/SOUL.md workspace-root copy. The built-in agent and the external-agent workspace both reach the same skills/aero/soul.md via the copy-agent-assets build step.

Version 2.0.2 → 2.1.0 (new tool surface).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(agent): align Aero provider IDs with sweep (anthropic→claude, google→gemini)

Aero identified LLM backends as anthropic/openai/google/zai while the sweep side uses claude/gemini/openai. Operators saw two vocabularies for the same concept. Standardize on the sweep naming and expose a canonical ProviderIds enum in @ainyc/canonry-contracts that both surfaces reference.

- New contracts/src/providers.ts: ProviderIds + AgentProviderIds + SweepProviderIds
- Rename AGENT_PROVIDERS keys; drop canonryConfigKey (id === config key now)
- DB migration v39 rewrites existing agent_sessions.modelProvider values
- CLI --provider anthropic/google is now rejected; use claude/gemini

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(api): source agent provider enum from canonical AGENT_PROVIDER_IDS

The /agent/prompt OpenAPI spec still advertised the old ['anthropic','openai','google','zai'] enum after the rename to ['claude','openai','gemini','zai'], so spec-driven clients would send stale values and crash resolveModelForProvider. Importing AGENT_PROVIDER_IDS from @ainyc/canonry-contracts keeps the spec and the runtime validator in lock-step across future renames.
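An illustrative version of the `description` frontmatter parse that list_skill_docs performs on each references/*.md file (the real parser may differ; this only sketches the manifest-building idea):

```typescript
// Extract the `description:` value from a leading `---`-delimited
// frontmatter block, or return undefined when none is present.
function parseDescription(markdown: string): string | undefined {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return undefined;
  const line = match[1].split("\n").find((l) => l.startsWith("description:"));
  return line?.slice("description:".length).trim();
}
```

list_skill_docs would run this over every playbook to build a slug-to-description manifest, which read_skill_doc then validates slugs against.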
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(agent): proactive drain on cold sessions + authoritative transcript reset

Two reviewer findings on the native Aero loop:

1. drainNow returned early when the in-memory pending map was empty, so queueFollowUp → drainNow on a cold / post-restart session never woke the agent: the follow-up sat in the DB queue until a manual prompt hydrated the session. drainNow now checks both the in-memory pending map and the persisted follow_up_queue via hasPendingWork(); acquireForTurn → getOrCreate handles the hydration + DB→pending migration.

2. DELETE /agent/transcript wiped the DB row but only called evict(), leaving the in-memory pending buffer and scope cache intact. A system follow-up queued on a hot session could leak into the next prompt after a reset. A new SessionRegistry.reset() clears the live agent + pending + scopes; the route uses it in place of evict().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(server): inject <base href> unconditionally so SPA deep-links work

Without an explicit basePath, the built index.html's relative `./assets/...` paths resolved against the current URL — visiting `/projects/:name` directly fetched `/projects/assets/index-*.js`, hit the SPA fallback, and received HTML where the browser expected JS, so React never mounted. Always emit `<base href="${basePath ?? '/'}">`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: revert version bump, keep at 2.0.0

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
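The unconditional `<base href>` emission from the SPA deep-link fix might look like this (the injection point right after `<head>` is an assumption for illustration; the real server templating may differ):

```typescript
// Inject <base href> so the built index.html's relative ./assets/... paths
// resolve against the base, not the current deep-link URL like /projects/x.
function injectBaseHref(html: string, basePath?: string): string {
  const href = basePath ?? "/";
  return html.replace("<head>", `<head><base href="${href}">`);
}
```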
Summary
canonry agent) for interactive terminal chat sessions

Commits
- feat: built-in agent — LLM-powered AEO analyst with chat API — core agent loop, tools, store, routes, CLI command, DB migrations
- fix(security): Add project ownership verification to thread endpoints
- fix(agent): Add error handling for malformed JSON in tool call arguments
- perf(agent): Move dynamic imports to top-level
- refactor(agent): Replace circular HTTP self-calls with direct service layer
- style(agent): Remove dead code and unused types
- fix(agent): fix 5 bugs in agent loop, SSRF validation, and services — history windowing, SSRF validation, project-scoped getRun, complete getHistory, Claude replay fix

Test plan
- canonry agent interactive session against a project with runs

🤖 Generated with Claude Code